Abstract.

Human coronavirus (HCoV) NL63 is recognized as a common cause of upper respiratory infections and influenza-like
illness. In screening children with acute undifferentiated febrile illness in a school cohort in rural Haiti, we
identified HCoV-NL63 in blood samples from four children. Cases clustered over an 11-day period; the children did not
have respiratory symptoms, but two had gastrointestinal complaints. On phylogenetic analysis, the Haitian HCoV-NL63
strains cluster together in a highly supported monophyletic clade linked most closely with recently reported
strains from Malaysia; two respiratory HCoV-NL63 strains identified in north Florida in the same general period
form a separate clade, again with close links to the Malaysian strains. Our data highlight the variety of
presentations that may be seen with HCoV-NL63 and underscore the apparent ease with which CoV strains move
among countries; they are consistent with recurrent introduction of strains into the Caribbean (Haiti and
Florida) from Asia.

Human coronaviruses (HCoVs) are being recognized with increasing frequency globally as a
cause of human respiratory infections. 1-8 Coronavirus strains are also associated with infections in animals,
and current data suggest that bats are a primary reservoir and point of origin for HCoVs. 9,10
Although coronaviruses were first described in the 1960s, interest in them increased substantially in the early
2000s with the emergence of severe acute respiratory syndrome (SARS), 11 followed by
Middle East respiratory syndrome (MERS) in 2012. 12 Apart from SARS and MERS, which have
tended to occur in more delimited epidemics, four coronaviruses are currently recognized as
endemic in human populations: HCoV-229E, HCoV-OC43, HCoV-NL63, and HCoV-HKU1. 1-8

Our group works closely with the school clinic that serves four schools operated by the
Christianville Foundation in the Gressier/Leogane region of Haiti, some 20 miles west of
Port-au-Prince. The schools have a total of approximately 1,250 students, from prekindergarten to
grade 12. 13 Because of our interest in monitoring arboviral transmission within this population,
we have protocols in place for the collection of diagnostic blood samples from children
presenting to the school clinic with acute undifferentiated febrile illness (i.e., febrile illness with
no localizing signs, such as would be expected with pneumonia, upper respiratory infections, or
urinary tract infections). The protocol for sample collection was approved by the University of
Florida Institutional Review Board (IRB) and the Haitian National IRB, and written informed
consent was obtained from the parents or guardians of all study participants.

Copyright 2016 by the American Society of Tropical Medicine and Hygiene

CASE REPORTS 

Between May 2014 and February 2015, blood samples were obtained at the school clinic 
from a total of 177 children who met criteria for the diagnosis of acute undifferentiated febrile 
illness. HCoV-NL63 was identified in samples from four case patients seen between January 16 
and January 27, 2015. Ages ranged from 3 to 10 years (mean 6 years). All presented with 
subjective symptoms of fever, although only two were febrile when examined (39°C and 38.5°C, 
respectively). Two complained of headache, whereas two complained of “abdominal flu.” In 
keeping with the criteria for enrollment in the study, no respiratory symptoms or symptoms 
consistent with croup were reported. Routine stool cultures and examination for ova and 
parasites were performed on stool samples from all four children, as previously described 13 : 
Giardia was identified in the stool sample of one of the children with abdominal complaints, and 
Blastocystis was identified in one of the children with no gastrointestinal (GI) complaints. No 
bacterial pathogens were identified on stool culture. Malaria screens on all four children were 
negative. In all instances, illness was mild and self-limited, and children recovered without 
sequelae. 

Plasma samples were cultured using cell lines and conditions as previously described, 14,15
and as reported in detail in the Supplemental Information that accompanies this publication. In
brief, aliquots of plasma (25-100 µL) from each patient were inoculated in duplicate into five
different cell lines (A549, LLC-MK2, MDCK, MRC5, and Vero E6). Cell cultures from each of
the four patients displayed cytopathic effects (CPEs) that were clearly different from those
observed in cells from which alpha-, flavi-, and other viruses were isolated (Figure 1). The CPEs
were most obvious in LLC-MK2 cells and less so in Vero E6 cells incubated at either 33°C or 37°C.

Primary screens of spent culture medium from these cell cultures by reverse transcription polymerase
chain reaction (RT-PCR) were negative for arenavirus, alphavirus, enterovirus, and flavivirus genomic
RNAs (vRNAs); however, they were positive for HCoV-NL63 on the GenMark eSensor XT-8
RVP system (eSensor RVP; GenMark Diagnostics, Inc., Carlsbad, CA). The agent isolated in
these cultures was subsequently confirmed as HCoV-NL63 by RT-PCR and sequencing of the
amplicons and the virus genome. Similar methods were used to obtain HCoV-NL63 sequence data
from two positive respiratory samples collected in July and September 2015, respectively,
at University of Florida Health/Shands Hospital in Gainesville, FL. GenBank accession numbers:
HCoV-NL63/Haiti: KT266906; KX179494-KX179499. HCoV-NL63/Florida: KT381875.1;
KU521535.1.

A multiple sequence alignment was assembled that included 192 HCoV-NL63 spike gene region
reference sequences downloaded from NCBI (see Supplemental Table 1 for details about the
reference sequences), plus the four Haitian sequences and the two sequences from University of
Florida Health. The locations of the reference sequences considered for the analysis were
Belgium, China, Ghana, Hong Kong, Japan, Malaysia, the Netherlands, Sweden, Thailand, and the
United States. The best-fitting nucleotide substitution model was selected on the basis of the
hierarchical likelihood ratio test implemented in Modeltest version 3.7. 16 See the Supplemental
Information for a complete description of phylogenetic methods.
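Modeltest's hierarchical likelihood ratio test compares pairs of nested substitution models: twice the difference in maximized log-likelihoods is compared against a χ² distribution whose degrees of freedom equal the difference in free parameters. The sketch below illustrates a single pairwise comparison. It is not Modeltest's code; the log-likelihoods in the usage line are hypothetical, the default significance level is an assumption, and it uses the plain χ² reference distribution, ignoring the mixed-χ² correction that is sometimes applied when a parameter is tested at the boundary of its range (for example, a proportion of invariable sites of zero).

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) of a chi-square variable with integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: start from the df = 1 tail, erfc(sqrt(x/2)), then add half-integer terms.
    total = math.erfc(math.sqrt(half))
    for i in range(1, (df - 1) // 2 + 1):
        total += math.exp(-half) * half ** (i - 0.5) / math.gamma(i + 0.5)
    return total


def likelihood_ratio_test(lnl_simple: float, lnl_complex: float,
                          k_simple: int, k_complex: int,
                          alpha: float = 0.01):
    """Compare two nested models; returns (statistic, p_value, prefer_complex).

    alpha = 0.01 is an assumed default for illustration.
    """
    stat = 2.0 * (lnl_complex - lnl_simple)
    p_value = chi2_sf(stat, k_complex - k_simple)
    return stat, p_value, p_value < alpha


# Hypothetical maximized log-likelihoods for a simpler and a richer nested model.
stat, p, prefer_complex = likelihood_ratio_test(-5000.0, -4990.0, k_simple=0, k_complex=1)
print(stat, p, prefer_complex)
```

In a full hierarchical test, comparisons like this one are chained from the simplest model upward, and the richer model is retained only when the test rejects the simpler one.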

Bayes factor (BF) analysis showed a significantly better fit for the relaxed than for
the strict molecular clock model (LnBF > 15 in favor of the former). Under the relaxed clock, the
BF analysis also showed that the nonparametric Bayesian skyline plot was the best-fitting
demographic model (LnBF > 53 for each comparison). The estimated mean evolutionary rate
was 7.02 × 10⁻⁴ substitutions/site/year (95% highest posterior density [HPD] interval 3.97 × 10⁻⁴ to
1.09 × 10⁻³). The maximum clade credibility tree shows three main supported clades (Figure 2) with
no particular geographic structure, although the older lineages, close to the root of
the tree, are from the United States, with a most recent common ancestor dating back to the
mid-1970s (a tree with reference labels is provided in Supplemental Figure 1).
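The Bayes factor comparisons reduce to simple arithmetic on log marginal likelihoods. The log Bayes factor is the difference between the two models' log marginal likelihoods, and the evidence scale in the Supplemental Information is expressed as 2lnBF, so LnBF > 15 corresponds to 2lnBF > 30, well within the "very strong" range. The sketch below assumes that "LnBF" denotes the natural-log Bayes factor; the model names and marginal likelihood values in the example are hypothetical.

```python
from typing import Dict, Tuple


def log_bayes_factor(ln_ml_a: float, ln_ml_b: float) -> float:
    """Natural-log Bayes factor of model A over model B from log marginal likelihoods."""
    return ln_ml_a - ln_ml_b


def evidence_category(two_ln_bf: float) -> str:
    """Evidence against the lower-likelihood (null) model on the 2lnBF scale."""
    if two_ln_bf < 2:
        return "none"
    if two_ln_bf < 6:
        return "weak"
    if two_ln_bf <= 10:
        return "strong"
    return "very strong"


def compare_models(ln_ml: Dict[str, float]) -> Tuple[str, Dict[str, Tuple[float, str]]]:
    """Return the best model and, for each other model, (lnBF, category) in favor of the best."""
    best = max(ln_ml, key=ln_ml.get)
    report = {}
    for name, value in ln_ml.items():
        if name == best:
            continue
        ln_bf = log_bayes_factor(ln_ml[best], value)
        report[name] = (ln_bf, evidence_category(2.0 * ln_bf))
    return best, report


# Hypothetical log marginal likelihoods (e.g., stepping-stone estimates) for two clock models.
best, report = compare_models({"strict clock": -10020.0, "relaxed clock": -10000.0})
print(best, report)
```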

The four new Haitian isolates belonged to genotype C and clustered within a well-supported
(P > 0.95) sub-clade with a most recent common ancestor dating back to 2013 (95% HPD
interval 2012-2014). The two Florida sequences also clustered with genotype
C sequences and formed a separate but highly supported sub-clade (P > 0.95), with a most recent
common ancestor dating back to 2014 (95% HPD interval 2013.5-2014.5). The Haitian and
Florida isolates and a strain from Malaysia are part of a larger sub-clade branching off
independently from several genotype C Malaysian sequences, which suggests an Asian
origin for the new strains, although this sub-clade does not have strong statistical support (P <
0.9). Overall, the data are consistent with at least two recent and independent introductions of
HCoV-NL63 into the region (one in Haiti, one in north Florida), possibly originating from
east Asia.
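The posterior probabilities (P) quoted for these sub-clades are, in essence, the proportion of post-burn-in trees sampled by the MCMC in which the clade appears. A minimal sketch follows, assuming rooted trees written as nested Python tuples; the tip names and the 10 sampled trees are hypothetical, and a real analysis would read the BEAST tree log with a phylogenetics library rather than use this toy representation.

```python
from typing import FrozenSet, Iterable, List, Set, Union

Tree = Union[str, tuple]  # a tip name, or a tuple of child subtrees
Clade = FrozenSet[str]


def clades(tree: Tree) -> Set[Clade]:
    """Return the set of tips below every internal node of a rooted tree."""
    found: Set[Clade] = set()

    def walk(node: Tree) -> Clade:
        if isinstance(node, str):
            return frozenset([node])
        tips = frozenset().union(*(walk(child) for child in node))
        found.add(tips)
        return tips

    walk(tree)
    return found


def clade_posterior(trees: List[Tree], clade: Iterable[str], burnin: float = 0.1) -> float:
    """Fraction of post-burn-in trees that contain the given clade."""
    kept = trees[int(len(trees) * burnin):]
    target = frozenset(clade)
    return sum(target in clades(t) for t in kept) / len(kept)


# Hypothetical posterior sample: H* = Haiti-like tips, F* = Florida-like tips.
with_clade = (("H1", "H2"), ("F1", "F2"))
without_clade = (("H1", "F1"), ("H2", "F2"))
sample = [without_clade] + [with_clade] * 8 + [without_clade]
print(clade_posterior(sample, {"H1", "H2"}))  # 8 of the 9 retained trees
```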


DISCUSSION 

HCoV-NL63 is widely distributed in human populations and has been isolated from patients
with symptoms ranging from those of a mild “common cold” to influenza-like illness and,
rarely, pneumonia and more severe lower respiratory infections. 8 Cases tend to cluster, often in
winter/late winter months in temperate areas, although seasonality varies widely in
different parts of the world. Among the common HCoV groups, NL63 has been
reported as the most common cause of croup in infants < 6 months of age. 7 It is also commonly
seen as a coinfection with other known pathogenic viruses (in one study, 38% of cases were
infected with two or more viruses 7), and in a study in Ghana that compared children with
respiratory symptoms with control children without such symptoms, HCoV-NL63 was
significantly more common among control children than among case children (8.5% versus 6.8%, P =
0.022). HCoVs (including HCoV-NL63) have also been identified in stool samples from children
with acute gastroenteritis; however, 82% of patients from whom HCoVs were isolated also had
either norovirus or rotavirus in the stool sample. 18 Taken together, these data paint a picture of a
widely distributed virus, present in the respiratory and GI tracts, with uncertain pathogenicity,
possibly due to inherently low virulence and/or high levels of prior immunity in the
general population.

In this study, we isolated HCoV-NL63 from a small cluster of children with febrile illness
without respiratory symptoms, in January, in rural Haiti. Because our primary goal was to screen for
arboviruses, we specifically excluded children with respiratory symptoms from sample
collection, so we cannot determine how many children in that group might also have been infected
with or carrying HCoV-NL63. What was unexpected was the isolation of this specific coronavirus
from blood, which, to our knowledge, has not been previously reported. However, viremia
is not entirely unexpected: HCoV-NL63 is capable of infecting human kidney cells in tissue
culture, 19 using the same cell receptor (ACE2) as SARS-CoV, a virus that causes systemic
illness and is shed in respiratory secretions, urine, and stool, presumably after hematogenous
spread. The clinical significance of the viremia in our specific cases is uncertain. All case
patients reported fever, and two had GI complaints (although one of the two with GI complaints
also had Giardia on stool examination). Our findings indicate that viremia with HCoV-NL63
does occur and raise the possibility that it can cause a mild febrile illness in otherwise healthy
children, without respiratory manifestations but possibly with diarrhea.

At a phylogenetic level, our Haitian isolates were most closely linked with isolates from
Malaysia. We also sequenced two HCoV-NL63 isolates from respiratory samples at University
of Florida Health in Gainesville: they clustered together within a separate monophyletic clade,
with an independent link back to Malaysian strains. These findings underscore the apparent ease with
which coronavirus strains can spread globally, in keeping with the observed
continental jumps of both SARS-CoV and MERS-CoV. 20
Figure 1. LLC-MK2 cells. (A) Noninoculated cells appear crowded 13 days postseed; no spaces are present
between the cells, and the nuclei have prominent nucleoli. (B) Cells 13 days postinoculation with plasma sample four
appear different: the nuclei of the infected cells appear larger, lack prominent nucleoli, and have visibly darker
nuclear borders (black arrows); clearings due to detachment of dead cells are evident, as are a few refractile
floating dead cells. The virus grew at both 33°C and 37°C; this figure shows virus in cells grown at 33°C. The
appearance of the nuclei of the infected LLC-MK2 cells is consistent with HCoV-NL63 infection. Localization to
the nucleolus is a common feature of coronavirus nucleoproteins, resulting in the accumulation of cells in the M
phase of the cell cycle (nucleoli are absent in dividing cells) and the formation of syncytia in a fraction of the cells.
Moreover, immature HCoV-NL63 particles form in the rough endoplasmic reticulum (RER) surrounding the nucleus, and the dark borders
surrounding the enlarged nuclei in Figure 1B are likely due to changes in the RER of the infected cells.
Representative photomicrographs of cytopathic effects of other viral species are shown in Supplemental Figures 2-4.
This figure appears in color at www.ajtmh.org.


FIGURE 2. Maximum Clade Credibility tree of HCoV-NL63 spike gene region. Branches are colored according to 
the legend to the left where each color represents the geographic location of the sampled sequence (tip branches), as 
well as of the ancestral lineage (internal branches) inferred by Bayesian phylogeography. The tree was scaled in 
time by enforcing a relaxed molecular clock (see Supplemental Information). One asterisk along the branch 
represents significant statistical support (posterior probability > 90%). Sequence labels for the new strains are 
shown. This figure appears in color at www.ajtmh.org. 



SUPPLEMENTAL INFORMATION 


Viral culture, identification, and sequencing. 

Cell cultures. 

Cell lines A549 (CCL-185), LLC-MK2 (CCL-7), MDCK (CCL-34), MRC5, and Vero E6
(CRL-1586) were obtained from the American Type Culture Collection (Manassas, VA) and
were propagated as monolayers at 37°C and 5% CO2 in Advanced Dulbecco’s Modified Eagle’s
Medium (aDMEM) (Invitrogen Corp., Carlsbad, CA) or Eagle’s Minimal Essential Medium
(EMEM) (Invitrogen Corp.), as appropriate for each cell line. aDMEM and EMEM were
supplemented with 2 mM L-alanyl-L-glutamine (GlutaMAX™; Invitrogen Corp.), antibiotics
(PSN; 50 µg/mL penicillin, 50 µg/mL streptomycin, 100 µg/mL neomycin [Invitrogen Corp.]),
and 10% (v/v) low-IgG, heat-inactivated, gamma-irradiated fetal bovine serum (FBS) (HyClone,
Logan, UT). Additionally, sodium pyruvate (Invitrogen Corp.) and nonessential amino acids
(HyClone) were added to EMEM. Before seed stocks were prepared, the cell lines were
propagated in growth media with Plasmocin (InvivoGen, San Diego, CA) for 2 weeks to reduce
the chances of mycoplasma contamination, then for a minimum of 2 weeks in the absence of
antibiotics to determine whether fast-growing microbial contaminants were present or abnormal
morphological changes (associated with intracellular mycoplasma) would occur. Following 2-3
weeks of propagation without antibiotics, the Plasmocin-treated cell lines were tested by
polymerase chain reaction (PCR) to confirm the absence of mycoplasma DNA. 1

Virus isolation. 

Aliquots of plasma (25-100 µL) from febrile patients that had tested negative for
chikungunya virus RNA by reverse transcription PCR (RT-PCR) (details to be presented
elsewhere) were inoculated onto duplicate sets of cells grown in 25 cm² rectangular canted-neck
cell culture flasks with vent caps (Cat. no. 430639, Corning Incorporated, Corning, NY). A total
of 10 flasks were used per plasma sample. Cell culture flasks were used so that plasma could be
inoculated onto a relatively wide surface area; otherwise, the complex mix of biomolecules in
human plasma/serum can induce nonspecific cytopathic effects (CPEs) that may be mistaken for
virus-induced CPE. Sets consisting of five different cell lines (A549, LLC-MK2, MDCK,
MRC5, and Vero E6) were used to increase the chance of isolating a wide array of viruses. One
set was incubated at 37°C, the other at 33°C. All cells were in complete medium with serum,
except for MDCK cells at 33°C, which were in serum-free medium containing 2 µg/mL of
L-1-tosylamide-2-phenylethyl chloromethyl ketone (TPCK)-treated trypsin. 2 Noninoculated cells
were held in parallel with the inoculated cells and served as negative controls. All the cells were
observed daily for development of virus-induced CPE using an inverted microscope with
phase-contrast optics and were refed every 3 days with complete medium containing 4% FBS (or, for
MDCK cells at 33°C, serum-free medium with TPCK-treated trypsin). The cells were maintained and
observed for 1 month before being considered negative for virus isolation.

Virus screens. 

This work was focused on the isolation and/or detection of alpha- and flaviviruses, and
PCR-based methods for the detection of the nucleic acids of those viruses (to be presented elsewhere)
were used for primary screens. A limited number of PCR-based methods were also used for
secondary tests, for the detection of arenaviruses and enteroviruses, viruses
that could be present in the serum/plasma of febrile patients. Furthermore, the GenMark multiplex
respiratory PCR eSensor XT-8 Respiratory Viral Panel (eSensor RVP; GenMark Diagnostics,
Inc., Carlsbad, CA) was used to screen for respiratory viruses according to the manufacturer’s
instructions. The system detects the genomic material of influenza A virus (including subtypes
H1 and H3), influenza A virus 2009 H1N1, influenza B virus, respiratory syncytial viruses A and
B, parainfluenza viruses 1, 2, 3, and 4, human metapneumovirus, adenoviruses B/E and C,
human coronaviruses (HCoV-229E, -NL63, -HKU1, and -OC43), and human rhinoviruses A and
B. Briefly, in the case of viral genomic RNA (vRNA), the extracted nucleic acid is reverse
transcribed and amplified using virus-specific primers with an RT-PCR enzyme mix. The
amplified DNA is converted to single-stranded DNA by exonuclease digestion and is combined
with a signal buffer containing ferrocene-labeled signal probes that are specific for the different
viral targets. The signal is reported in nanoamperes (nA); signals higher than a threshold value are
considered positive.
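The final call step described above amounts to comparing each target's signal with a cutoff. The sketch below is illustrative only: the threshold(s) used by the eSensor RVP software are defined by the manufacturer and are not given in this document, so the threshold value, target labels, and signal readings shown are hypothetical.

```python
from typing import Dict


def call_targets(signals_na: Dict[str, float], threshold_na: float) -> Dict[str, bool]:
    """Mark a target positive only when its signal (in nA) is strictly above the threshold."""
    return {target: signal > threshold_na for target, signal in signals_na.items()}


# Hypothetical readings and threshold; these are not manufacturer values.
readings = {"HCoV-NL63": 250.0, "HCoV-229E": 3.0, "HCoV-OC43": 10.0}
print(call_targets(readings, threshold_na=10.0))
```

A signal exactly at the threshold is treated as negative here, following the wording "higher than a threshold value".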

RT-PCR tests for confirmation of HCoV-NL63.

Following detection of HCoV-NL63 vRNA by the GenMark eSensor XT-8 RVP,
confirmation was obtained using methods outlined in reference 3. Briefly, vRNA was extracted
from spent cell culture medium using a QIAamp Viral RNA kit (QIAGEN, Valencia, CA), and RT-PCR
amplicons generated using primer pairs Cor-FW and Cor-RV, N5-PCR1 and N3-PCR1, and
repSZ-1 and repSZ-3 were sequenced. 5 Reverse transcription was performed with
Omniscript reverse transcriptase (QIAGEN), and PCR with Hotshot TAQ (New England Biolabs,
Ipswich, MA) with extension at 68°C.

Nucleotide sequencing. 

Targeted HCoV-NL63 sequences were RT-PCR-amplified from purified vRNA using a
genome walking strategy. 3 Briefly, overlapping primers described by Geng and others
(GenBank JX524171) and others 4,5 were used to obtain the complete genomic sequence of one of
the HCoV-NL63 isolates from Haiti (designated HCoV-NL63/Haiti-1/2015). Following the same
methods, spike, membrane, and nucleocapsid gene sequences were also obtained for
HCoV-NL63 isolates 2, 3, and 4 from Haiti. As before, 3 AccuScript High Fidelity Reverse Transcriptase
(Agilent Technologies Inc., Santa Clara, CA) was used for first-strand cDNA synthesis in the
presence of SUPERase-In RNase inhibitor (Ambion, Austin, TX), and PCR was performed using
Phusion polymerase (New England Biolabs) with denaturation steps at 98°C. The 3′
and 5′ ends of HCoV-NL63/Haiti-1/2015 were determined using a RACE (rapid amplification of
cDNA ends) kit (RLM-RACE; Ambion) following the manufacturer’s instructions. For all isolates,
nucleotide sequences were determined on an Applied Biosystems 3130 DNA analyzer using
BigDye Terminator (v. 3.1) chemistry and the same primers used for amplification.

GenBank accession numbers. 

HCoV-NL63/Haiti: KT266906; KX179494-KX179499. HCoV-NL63/Florida: KT381875.1;
KU521535.1.

Phylogenetic analysis. 

Forty-four complete genome and 148 spike gene region sequences were downloaded from
NCBI (Supplemental Table 1). Inclusion criteria for the sequences were as follows: 1) the sequences
had been published in peer-reviewed journals; 2) there was no uncertainty regarding subtype
assignment and classification; 3) the sequences were not potential recombinants or of poor quality
(i.e., sequences with uncertain nucleotide assignments were excluded); and 4) the city/state and
sampling time were known and clearly established in the original publication. The locations of the
reference sequences considered for the analysis were Belgium, China, Ghana, Hong Kong, Japan,
Malaysia, the Netherlands, Sweden, Thailand, and the United States. A multiple sequence
alignment of the spike gene region, including the 192 reference sequences as well as the two
new sequences from Shands and the four from Haiti, was then obtained with ClustalX 6 followed
by manual editing in BioEdit. 7 The best-fitting nucleotide substitution model was selected on the
basis of the hierarchical likelihood ratio test implemented in Modeltest version 3.7. The full
alignment is available from the authors on request.
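The four inclusion criteria can be expressed as a simple filter over sequence metadata. This is a sketch under assumptions: the record fields (`published`, `genotype`, `possible_recombinant`, `location`, `year`) are hypothetical names for information that would have to be curated by hand from the original publications, and treating any character other than A, C, G, T, or an alignment gap as an "uncertain nucleotide assignment" is one reading of criterion 3, not a rule stated by the authors.

```python
from dataclasses import dataclass
from typing import List, Optional

# Anything outside this set (e.g., IUPAC ambiguity codes N, R, Y) counts as uncertain.
UNAMBIGUOUS = set("ACGT-")


@dataclass
class SequenceRecord:
    accession: str
    sequence: str
    published: bool             # criterion 1: peer-reviewed publication
    genotype: Optional[str]     # criterion 2: None when the assignment is uncertain
    possible_recombinant: bool  # criterion 3: flagged as a potential recombinant
    location: Optional[str]     # criterion 4: city/state of sampling
    year: Optional[int]         # criterion 4: sampling time


def passes_inclusion(rec: SequenceRecord) -> bool:
    return (
        rec.published
        and rec.genotype is not None
        and not rec.possible_recombinant
        and set(rec.sequence.upper()) <= UNAMBIGUOUS
        and rec.location is not None
        and rec.year is not None
    )


def filter_references(records: List[SequenceRecord]) -> List[str]:
    return [r.accession for r in records if passes_inclusion(r)]


# Hypothetical records, each illustrating a different outcome.
records = [
    SequenceRecord("REF_OK", "ATGCTT-GA", True, "C", False, "City A", 2012),
    SequenceRecord("REF_AMBIG", "ATGNTTAGA", True, "C", False, "City A", 2012),
    SequenceRecord("REF_RECOMB", "ATGCTTAGA", True, "B", True, "City B", 1990),
    SequenceRecord("REF_NODATE", "ATGCTTAGA", True, "A", False, "City C", None),
]
print(filter_references(records))
```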

Likelihood mapping and phylogenetic inference. 

The phylogenetic signal in the aligned sequences was investigated with the likelihood
mapping method, which analyzes every possible group of four sequences, referred to as a
“quartet.” 9 For each quartet, the likelihoods of the three possible unrooted trees are
estimated and reported as a dot in an equilateral triangle (the likelihood map) that is
subdivided into three main areas: the three corners, representing fully
resolved tree topologies (i.e., the presence of treelike phylogenetic signal in the data); the center,
representing star-like phylogeny (i.e., phylogenetic noise); and the three side areas,
indicating network-like phylogeny (i.e., the presence of recombination or conflicting phylogenetic
signals). Findings from extensive simulation studies suggest that a central area containing < 30% of
the dots can be used as the criterion for robust phylogenetic signal. The likelihood mapping analysis was
performed with the program TREE-PUZZLE. 9 The percentage of dots (noise level) falling in the
center of the likelihood map was 15.5%, indicating sufficient signal to infer a robust phylogeny.
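The likelihood-mapping summary can be sketched as follows. Each quartet's three tree likelihoods are converted to weights that sum to 1, which places the quartet at a point in the triangle, and the noise level is the percentage of quartets falling in the central region. The region rule used here, which assigns each point to the nearest of seven reference points (three corners, three side midpoints, and the center), is an assumption made for illustration and may not match TREE-PUZZLE's exact partition of the map; the log-likelihood values are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Reference points in barycentric coordinates (weights of topologies 1, 2, 3).
ATTRACTORS = {
    "corner": [(1, 0, 0), (0, 1, 0), (0, 0, 1)],
    "side": [(0.5, 0.5, 0), (0.5, 0, 0.5), (0, 0.5, 0.5)],
    "center": [(1 / 3, 1 / 3, 1 / 3)],
}


def quartet_weights(log_likelihoods: Sequence[float]) -> Tuple[float, ...]:
    """Convert three tree log-likelihoods into weights summing to 1 (numerically stable)."""
    top = max(log_likelihoods)
    exps = [math.exp(ll - top) for ll in log_likelihoods]
    total = sum(exps)
    return tuple(e / total for e in exps)


def region(weights: Sequence[float]) -> str:
    """Assign a point to the region of its nearest reference point (illustrative rule)."""
    best_label, best_dist = "", float("inf")
    for label, points in ATTRACTORS.items():
        for p in points:
            d = math.dist(weights, p)
            if d < best_dist:
                best_label, best_dist = label, d
    return best_label


def noise_level(quartets: List[Sequence[float]]) -> float:
    """Percentage of quartets falling in the central (star-like) region."""
    central = sum(region(quartet_weights(q)) == "center" for q in quartets)
    return 100.0 * central / len(quartets)


# Hypothetical log-likelihoods for three quartets: tree-like, network-like, star-like.
quartets = [(-1000.0, -1050.0, -1050.0), (-1000.0, -1000.0, -1050.0), (-1000.0, -1000.0, -1000.0)]
level = noise_level(quartets)
print(level, "robust signal" if level < 30 else "weak signal")
```

With only three toy quartets, one of which is star-like, the noise level is one third; in a real analysis the percentage is taken over a large number of quartets, as in the 15.5% reported above.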

Bayesian time-scaled phylogeny. 

The evolutionary rate of the HCoV-NL63 spike region was estimated by calibrating a
molecular clock using known sequence sampling times with the Bayesian Markov chain Monte
Carlo (MCMC) method implemented in BEAST v. 1.8 (http://beast.bio.ed.ac.uk), 10,11 enforcing
either a strict or a relaxed molecular clock with a lognormal distribution of rates. Four independent
MCMC runs were carried out enforcing a relaxed molecular clock (which provided the best fit for the
data; see main text) and one of the following coalescent priors: constant population size,
exponential growth, nonparametric Gaussian Markov random field smooth skyride plot, and
nonparametric Bayesian skyline plot. 10,12,13 Marginal likelihood estimates for each model were
obtained using path sampling and stepping-stone analyses. 14-16 Uncertainty in the estimates was
indicated by 95% highest posterior density (HPD) intervals, and the best-fitting model for the data set
was selected by calculating Bayes factors (BFs). 15,17
In practice, any two models were compared to evaluate the strength of evidence against the null
hypothesis (H0), defined as the model with the lower marginal likelihood: 2lnBF < 2 indicates no
evidence against H0; 2-6, weak evidence; 6-10, strong evidence; and > 10, very strong evidence.
For each data set, the MCMC sampler was run for at least 50 × 10⁶ generations, sampling every
5,000 generations. Proper mixing of the MCMC was assessed by calculating the effective sample
size (ESS) of each parameter; only parameter estimates with ESS > 250 were accepted.
Phylogeographic analysis was conducted using the continuous-time Markov chain process
over discrete sampling locations implemented in BEAST, 18 with the Bayesian stochastic search
variable selection model, which allows diffusion rates to be zero with a positive prior
probability. The maximum clade credibility tree was selected from the posterior tree distribution
after a 10% burn-in using TreeAnnotator version 1.8, included in the BEAST package. 10,11
Statistical support for specific monophyletic clades was assessed by calculating posterior
probabilities. The phylogenetic tree with full sequence labels is shown in Supplemental Figure 1.
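The convergence and summary steps described above (discarding burn-in, checking ESS, and reporting 95% HPD intervals) can be sketched for a single parameter trace. With the settings described, 50 × 10⁶ generations sampled every 5,000 generations yield on the order of 10,000 logged states per run. The ESS estimator below, which sums autocorrelations until the first non-positive lag, is a simple textbook version and is not necessarily the estimator used by BEAST's companion tools. The synthetic traces are hypothetical stand-ins for a logged parameter.

```python
import math
import random
from typing import List, Sequence, Tuple


def effective_sample_size(trace: Sequence[float]) -> float:
    """ESS = n / (1 + 2 * sum of autocorrelations), truncated at the first non-positive lag."""
    n = len(trace)
    mean = sum(trace) / n
    centered = [x - mean for x in trace]
    var = sum(c * c for c in centered) / n
    if var == 0:
        raise ValueError("trace is constant; ESS is undefined")
    tau = 1.0
    for lag in range(1, n):
        rho = sum(centered[i] * centered[i + lag] for i in range(n - lag)) / (n * var)
        if rho <= 0:
            break
        tau += 2.0 * rho
    return n / tau


def hpd_interval(samples: Sequence[float], mass: float = 0.95) -> Tuple[float, float]:
    """Shortest interval containing `mass` of the sorted samples."""
    xs = sorted(samples)
    k = max(1, math.ceil(mass * len(xs)))  # number of samples inside the interval
    start = min(range(len(xs) - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[start], xs[start + k - 1]


def summarize(trace: List[float], burnin: float = 0.1, min_ess: float = 250.0) -> dict:
    """Drop burn-in, then report mean, 95% HPD, ESS, and whether ESS exceeds min_ess."""
    kept = trace[int(len(trace) * burnin):]
    ess = effective_sample_size(kept)
    return {"mean": sum(kept) / len(kept), "hpd95": hpd_interval(kept),
            "ess": ess, "accepted": ess > min_ess}


# Synthetic traces: independent draws versus a strongly autocorrelated AR(1) chain.
rng = random.Random(1)
iid = [rng.gauss(0.0, 1.0) for _ in range(5000)]
ar, x = [], 0.0
for _ in range(10000):
    x = 0.9 * x + rng.gauss(0.0, 1.0)
    ar.append(x)
print(summarize(iid))
print(summarize(ar))
```

The autocorrelated chain has far more logged states than the independent one but a much smaller ESS. This is why ESS, rather than chain length, was used as the acceptance criterion.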
Supplemental Figure 1. Maximum clade credibility (MCC) tree of the HCoV-NL63 spike gene region with
sequence labels. Branches are colored according to the legend in Figure 2, where each color represents the
geographic location of the sampled sequence (tip branches), as well as of the ancestral lineage (internal branches),
inferred by Bayesian phylogeography. For display purposes, the tree is shown in a polar layout.


Supplemental Figure 2. Enterovirus D68 in A549 cells. (A) Mock-infected A549 cells, 6 days postseed. (B)
A549 cells, 6 days postinoculation with plasma. Infected cells display characteristic cytopathic effect (CPE) at 33°C:
cell rounding followed by detachment from the growing surface; CPE is not observed at 37°C (not
shown). Images at an original magnification of ×200.


Supplemental Figure 3. Dengue viruses in Vero E6 cells incubated at 37°C. (A) Mock-infected Vero E6 cells, 8 days
postseed. (B) Dengue virus 1-infected cells, 8 days postinoculation with plasma. (C) Dengue virus 4-infected cells,
8 days postinoculation with plasma. The virus-infected cells display diffuse cytopathic effect (CPE), including
darkening of the cytoplasm prior to death (colorless refractile cells). Some infected cells contain one or more large
vacuoles. Images at an original magnification of ×200.


Supplemental Figure 4. Zika virus in LLC-MK2 cells. (A) Mock-infected LLC-MK2 cells, 9 days postseed,
original magnification ×200. (B) Zika virus-infected cells, 9 days postinoculation with plasma, original
magnification ×200. (C) Mock-infected LLC-MK2 cells, 9 days postseed, original magnification ×400.
(D) Zika virus-infected cells, 9 days postinoculation with plasma, original magnification ×400. Prior
to cell death, perinuclear vacuoles are present in most cells (white arrows).



Supplemental Table 1

Accession number, gene region, and country of reference sequences included in this study

Accession number   Gene region          Country         Tip name
KT381875           Complete genome      United States   Shands
KU521535           Complete genome      United States   SHANDS_2
KT266906           Complete genome      Haiti           HAITI 1
JX504050           Complete genome      United States   2US04
JX104161           Complete genome      China           3CN08
AB695189           Spike glycoprotein   Japan           4JP11
AB695188           Spike glycoprotein   Japan           5JP11
AB695187           Spike glycoprotein   Japan           6JP10
AB695186           Spike glycoprotein   Japan           7JP10
AB695185           Spike glycoprotein   Japan           8JP10
AB695184           Spike glycoprotein   Japan           9JP10
AB695183           Spike glycoprotein   Japan           10JP10
DQ462792           Spike glycoprotein   Netherlands     11NL03
DQ462789           Spike glycoprotein   Netherlands     14NL03
DQ462788           Spike glycoprotein   Netherlands     15NL03
DQ462787           Spike glycoprotein   Netherlands     16NL03
DQ462786           Spike glycoprotein   Netherlands     17NL03
DQ462785           Spike glycoprotein   Netherlands     18NL03
DQ462784           Spike glycoprotein   Netherlands     19NL04
DQ462783           Spike glycoprotein   Netherlands     20NL04
DQ462782           Spike glycoprotein   Netherlands     21NL02
AY758299           Spike glycoprotein   Belgium         23BE03
AY758301           Spike glycoprotein   Belgium         24BE03
AY758300           Spike glycoprotein   Belgium         25BE03
AY758298           Spike glycoprotein   Belgium         26BE03
AY758297           Spike glycoprotein   Belgium         27BE03
GQ856814           Spike glycoprotein   Hong Kong       29HK05
GQ856812           Spike glycoprotein   Hong Kong       30HK05
GQ856813           Spike glycoprotein   Hong Kong       31HK05
GQ856811           Spike glycoprotein   Hong Kong       32HK05
GQ856810           Spike glycoprotein   Hong Kong       33HK05
GQ856809           Spike glycoprotein   Hong Kong       34HK06
GQ856808           Spike glycoprotein   Hong Kong       35HK06
GQ856807           Spike glycoprotein   Hong Kong       36HK06
GQ856806           Spike glycoprotein   Hong Kong       37HK06
GQ856805           Spike glycoprotein   Hong Kong       38HK06
GQ856804           Spike glycoprotein   Hong Kong       39HK06
GQ856803           Spike glycoprotein   Hong Kong       40HK06
GQ856802           Spike glycoprotein   Hong Kong       41HK06
GQ856801           Spike glycoprotein   Hong Kong       42HK06
GQ856800           Spike glycoprotein   Hong Kong       43HK06
GQ856799           Spike glycoprotein   Hong Kong       44HK06
GQ856798           Spike glycoprotein   Hong Kong       45HK06
DQ445912           Complete genome      Netherlands     52NL03
DQ445911           Complete genome      Netherlands     53NL04
DQ231166           Spike glycoprotein   Sweden          54SE04
DQ231165           Spike glycoprotein   Sweden          55SE05
DQ231164           Spike glycoprotein   Sweden          56SE05
DQ231163           Spike glycoprotein   Sweden          57SE05
DQ231162           Spike glycoprotein   Sweden          58SE05
DQ231161           Spike glycoprotein   Sweden          59SE04
DQ231160           Spike glycoprotein   Sweden          60SE04
DQ231159           Spike glycoprotein   Sweden          61SE04
DQ231158           Spike glycoprotein   Sweden          62SE05
AY902242           Spike glycoprotein   Sweden          63SE04
NC_005831          Complete genome      Netherlands     64NL03
AY567487           Complete genome      Netherlands     65NL03
KT359913           Spike glycoprotein   Malaysia        66MY13
KT359912           Spike glycoprotein   Malaysia        67MY13
KT359911           Spike glycoprotein   Malaysia        68MY13
KT359910           Spike glycoprotein   Malaysia        69MY13
KT359909           Spike glycoprotein   Malaysia        70MY12
KT359908           Spike glycoprotein   Malaysia        71MY12
KT359907           Spike glycoprotein   Malaysia        72MY12
KT359906           Spike glycoprotein   Malaysia        73MY12
KT359905           Spike glycoprotein   Malaysia        74MY12
KT359904           Spike glycoprotein   Malaysia        75MY12
KT359903           Spike glycoprotein   Malaysia        76MY12
KT359902           Spike glycoprotein   Malaysia        77MY12
KT359901           Spike glycoprotein   Malaysia        78MY12
KT359900           Spike glycoprotein   Malaysia        79MY12
KT359899           Spike glycoprotein   Malaysia        80MY12
KT359898           Spike glycoprotein   Malaysia        81MY12
KT359897           Spike glycoprotein   Malaysia        82MY12
KT359896           Spike glycoprotein   Malaysia        83MY12
KT359895           Spike glycoprotein   Malaysia        84MY12
KT359894           Spike glycoprotein   Malaysia        85MY12
KT359893           Spike glycoprotein   Malaysia        86MY12
KT359892           Spike glycoprotein   Malaysia        87MY12
KT359891           Spike glycoprotein   Malaysia        88MY12
KT359890           Spike glycoprotein   Malaysia        89MY12
KT359889           Spike glycoprotein   Malaysia        90MY12
KT359888           Spike glycoprotein   Malaysia        91MY12
KT359887           Spike glycoprotein   Malaysia        92MY12
KT359886           Spike glycoprotein   Malaysia        93MY12
KT359885           Spike glycoprotein   Malaysia        94MY12
KT359884           Spike glycoprotein   Malaysia        95MY12
KT359883           Spike glycoprotein   Malaysia        96MY12
KT359882           Spike glycoprotein   Malaysia        97MY12
KT359881           Spike glycoprotein   Malaysia        98MY12
KT359880           Spike glycoprotein   Malaysia        99MY12
KT359879           Spike glycoprotein   Malaysia        100MY12
KT359878           Spike glycoprotein   Malaysia        101MY12
KT359877           Spike glycoprotein   Malaysia        102MY12
KT359876           Spike glycoprotein   Malaysia        103MY12
KT359875           Spike glycoprotein   Malaysia        104MY12
KT359874           Spike glycoprotein   Malaysia        105MY12
KT359873           Spike glycoprotein   Malaysia        106MY12
KT359872           Spike glycoprotein   Malaysia        107MY12
KM077093           Spike glycoprotein   United States   108US91
KM077092           Spike glycoprotein   United States   109US93
KM077091           Spike glycoprotein   United States   110US95
KM077090           Spike glycoprotein   United States   111US97
KM077089           Spike glycoprotein   United States   112US01
KM077088           Spike glycoprotein   United States   113US83
KM077087           Spike glycoprotein   United States   114US87
KM077086           Spike glycoprotein   United States   115US92
KM077085           Spike glycoprotein   United States   116US90
KM077084           Spike glycoprotein   United States   117US89
KM077083           Spike glycoprotein   United States   118US90
KM077082           Spike glycoprotein   United States   119US89
KM077081           Spike glycoprotein   United States   120US90
KM077080           Spike glycoprotein   United States   121US90
KM077079           Spike glycoprotein   United States   122US99
KM077078           Spike glycoprotein   United States   123US01
KM077077           Spike glycoprotein   United States   124US91
KM077076           Spike glycoprotein   United States   125US96
KM077075           Spike glycoprotein   United States   126US96
KM077074           Spike glycoprotein   United States   127US93
KM077073           Spike glycoprotein   United States   128US92
KM077072           Spike glycoprotein   United States   129US91
KF530114           Complete genome      United States   130US89
KF530113           Complete genome      United States   131US90
KF530112           Complete genome      United States   132US01
KF530111           Complete genome      United States   133US90
KF530110           Complete genome      United States   134US83
KF530109           Complete genome      United States   135US90

KF530108 

Complete genome 

United States 

136US89 

KF530107 

Complete genome 

United States 

137US91 

KF530106 

Complete genome 

United States 

138US87 

KF530105 

Complete genome 

United States 

139US01 

KF530104 

Complete genome 

United States 

140US90 

JQ900259 

Complete genome 

United States 

142US05 

JQ900257 

Complete genome 

United States 

144US09 

JQ900256 

Complete genome 

United States 

145US09 

JQ900255 

Complete genome 

United States 

146US09 

JQ765575 

Complete genome 

United States 

147US05 

JQ765574 

Complete genome 

United States 

148US05 

JQ765573 

Complete genome 

United States 

149US05 

JQ765572 

Complete genome 

United States 

150US05 

JQ765571 

Complete genome 

United States 

151US05 

JQ765570 

Complete genome 

United States 

152US05 

JQ765569 

Complete genome 

United States 

153US05 

JQ765568 

Complete genome 

United States 

154US05 

JQ765567 

Complete genome 

United States 

155US09 

JQ765566 

Complete genome 

United States 

156US08 

JQ765565 

Complete genome 

United States 

157US09 

JQ765564 

Complete genome 

United States 

158US09 

JQ765563 

Complete genome 

United States 

159US09 

JQ771060 

Complete genome 

United States 

160US10 

JQ771059 

Complete genome 

United States 

161US10 

JQ771058 

Complete genome 

United States 

162US10 

JQ771057 

Complete genome 

United States 

163US10 

JQ771056 

Complete genome 

United States 

164US10 

JQ771055 

Complete genome 

United States 

165US10 
























































































































































































































KJ796467 

Spike glycoprotein 

Ghana 

166GH12 

KJ796466 

Spike glycoprotein 

Ghana 

167GH12 

KJ796465 

Spike glycoprotein 

Ghana 

168GH12 

KJ768643 

Spike glycoprotein 

Ghana 

169GH12 

KJ768642 

Spike glycoprotein 

Ghana 

170GH12 

KJ768641 

Spike glycoprotein 

Ghana 

171GH12 

KJ768640 

Spike glycoprotein 

Ghana 

172GH12 

KJ768639 

Spike glycoprotein 

Ghana 

173GH12 

KJ768638 

Spike glycoprotein 

Ghana 

174GH12 

KJ768637 

Spike glycoprotein 

Ghana 

175GH12 

KJ768636 

Spike glycoprotein 

Ghana 

176GH12 

KJ768635 

Spike glycoprotein 

Ghana 

177GH12 

KJ768634 

Spike glycoprotein 

Ghana 

178GH12 

KJ768633 

Spike glycoprotein 

Ghana 

179GH12 

JX513255 

Spike glycoprotein 

Thailand 

180TH10 

JX513253 

Spike glycoprotein 

Thailand 

181TH10 

JX513249 

Spike glycoprotein 

Thailand 

182TH10 

JX524171 

Complete genome 

China 

183CN09 

FJ656174 

Spike glycoprotein 

Sweden 

184SE07 

FJ656173 

Spike glycoprotein 

Sweden 

185SE07 

FJ656172 

Spike glycoprotein 

Sweden 

186SE07 

FJ656171 

Spike glycoprotein 

Sweden 

187SE07 

FJ656170 

Spike glycoprotein 

Sweden 

188SE07 

FJ656169 

Spike glycoprotein 

Sweden 

189SE07 

FJ656168 

Spike glycoprotein 

Sweden 

190SE06 

FJ656167 

Spike glycoprotein 

Sweden 

191SE07 

FJ656166 

Spike glycoprotein 

Sweden 

192SE07 

FJ656165 

Spike glycoprotein 

Sweden 

193SE07 

FJ656164 

Spike glycoprotein 

Sweden 

194SE07 

FJ656163 

Spike glycoprotein 

Sweden 

195SE07 

FJ656162 

Spike glycoprotein 

Sweden 

196SE07 

FJ656161 

Spike glycoprotein 

Sweden 

197SE07 

FJ656160 

Spike glycoprotein 

Sweden 

198SE07 

FJ656159 

Spike glycoprotein 

Sweden 

199SE07 

FJ656158 

Spike glycoprotein 

Sweden 

200SE07 

FJ656157 

Spike glycoprotein 

Sweden 

201SE07 

FJ656156 

Spike glycoprotein 

Sweden 

202SE06 

FJ656155 

Spike glycoprotein 

Sweden 

203SE06 

FJ656154 

Spike glycoprotein 

Sweden 

204SE06 

FJ656153 

Spike glycoprotein 

Sweden 

205SE06 